way around psql)


early mix: PG, Redis,
Memcached


but were hosted on a
single machine
somewhere in LA


less powerful than my
MacBook Pro


Okay, we launched. 
now what? 


25k signups in the first
day


everything is on fire


best & worst day of our
lives so far


load was through the
roof


friday rolls around 


Not slowing down 


let's move to EC2


PG upgrade to 9.0


scaling = replacing all
components of a car
while driving it at
100mph


this is the story of how
our usage of PG has
evolved


Phase 1: All ORM, all
the time


why pg? at first, PostGIS.


./manage.py syncdb


ORM made it too easy to
not really think through
primary keys


pretty good for getting off
the ground


Media.objects.get(pk = 4)


first version of our feed
(pre-launch)


friends = Relationships.objects.filter(source_user = user)


recent_photos = Media.objects.filter(user_id__in = friends).order_by('-pk')[0:20]


main feed at launch


Redis: 
// user 33 posts 
friends = SMEMBERS followers:33
for user in friends:
    LPUSH feed:<user_id> <media_id>


// for reading 
LRANGE feed:4 0 20 
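
The fan-out-on-write feed above can be sketched in plain Python. This is only an illustration: the dictionaries below stand in for Redis sets and lists, and the function names `fan_out` and `read_feed` are hypothetical, not from the talk.

```python
from collections import defaultdict, deque

# In-memory stand-ins for the Redis structures named on the slide (assumption:
# a real deployment would issue SMEMBERS / LPUSH / LRANGE against Redis).
followers = defaultdict(set)   # followers:<user_id> -> set of follower ids
feeds = defaultdict(deque)     # feed:<user_id>      -> newest-first media ids


def fan_out(author_id, media_id):
    """When a user posts, push the media id onto every follower's feed (LPUSH)."""
    for follower_id in followers[author_id]:
        feeds[follower_id].appendleft(media_id)


def read_feed(user_id, start=0, stop=20):
    """Mimic LRANGE feed:<user_id> start stop (stop is inclusive, as in Redis)."""
    return list(feeds[user_id])[start:stop + 1]


followers[33].update({4, 7})   # users 4 and 7 follow user 33
fan_out(33, 1001)
fan_out(33, 1002)
print(read_feed(4))            # newest post first
```

Reads stay cheap because each feed is precomputed; the cost moves to write time, where one post touches every follower's list.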


canonical data: PG 
feeds/lists/sets: Redis 
object cache: memcached


post-launch


moved db to its own
machine


at time, largest table:
photo metadata


ran master-slave from the
beginning, with
streaming replication


backups: stop the
replica, xfs_freeze drives,
and take EBS snapshot


AWS tradeoff


3 early problems we hit
with PG


1 oh, that setting was
the problem?


shared_buffers


cost_delay


3 connection pooling


(we use PGBouncer)


somewhere in this crazy
couple of months,
Christophe to the rescue


photos kept growing and
growing...


and only 68GB of
RAM on biggest
machine in EC2


so what now?


Phase 2: Vertical
Partitioning


django db routers make
it pretty easy


def db_for_read(self, model, **hints):
    # send reads for the photos app to its own database
    if model._meta.app_label == 'photos':
        return 'photodb'
...your foreign key


(all of those user/user_id
interchangeable calls bite
you now)
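
A slightly fuller router, as a minimal sketch. The `db_for_read` / `db_for_write` method names and the `model._meta.app_label` attribute are Django's router interface; the class name `PhotoRouter` and the alias `photodb` (taken from the slide) are illustrative, and the router would still need to be listed in `DATABASE_ROUTERS` in settings.

```python
class PhotoRouter:
    """Send everything in the 'photos' app to its own database alias."""

    photo_apps = {"photos"}   # app labels that live on the photos database
    photo_db = "photodb"      # alias defined in settings.DATABASES (assumed)

    def db_for_read(self, model, **hints):
        if model._meta.app_label in self.photo_apps:
            return self.photo_db
        return None           # no opinion: fall through to 'default'

    def db_for_write(self, model, **hints):
        # Reads and writes go to the same place in this sketch.
        return self.db_for_read(model, **hints)
```

Returning `None` lets other routers (or the default database) handle every model outside the photos app, which is why tangled foreign keys between apps become the real obstacle.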


plenty of time spent in
PGFouine


read slaves (using
streaming replicas) where
we need to reduce
contention


a few months later...


photosdb > 60GB


precipitated by being on
cloud hardware, but likely
to have hit limit eventually
either way


what now?


horizontal partitioning!


Phase 3: sharding


'surely we'll have hired
someone experienced
before we actually need
to shard'


never true about
scaling


1 choosing a method
2 adapting the application
3 expanding capacity


evaluated solutions 


at the time, none were
up to task of being our
primary DB


NoSQL alternatives


range/date-based


did in Postgres itself


requirements


1 low operational & code
complexity


2 easy expanding of
capacity


3 low performance
impact on application


schema-based logical
sharding


many many many
(thousands) of logical
shards


that map to fewer
physical ones


// 8 logical shards on 2 machines 
user_id % 8 = logical shard 


logical shards -> physical shard map 


{
  0: A, 1: A, 2: A, 3: A,
  4: B, 5: B, 6: B, 7: B
}


// 8 logical shards on 2 -> 4 machines
user_id % 8 = logical shard 


logical shards -> physical shard map 
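
The two layouts can be sketched as plain dictionaries. The machine names and the exact assignment of shards to the two new machines are assumptions made for illustration; the point is that `user_id % 8` never changes, only the lookup table does.

```python
NUM_LOGICAL_SHARDS = 8

# Hypothetical layouts: 8 logical shards on 2 machines, then on 4.
TWO_MACHINES = {0: "A", 1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B", 7: "B"}
FOUR_MACHINES = {0: "A", 1: "A", 2: "C", 3: "C", 4: "B", 5: "B", 6: "D", 7: "D"}


def logical_shard(user_id):
    """The logical shard is fixed for the lifetime of the user."""
    return user_id % NUM_LOGICAL_SHARDS


def physical_machine(user_id, shard_map):
    """Only this lookup changes when capacity is added."""
    return shard_map[logical_shard(user_id)]


def moved_shards(old_map, new_map):
    """Logical shards whose physical home differs between two layouts."""
    return sorted(s for s in old_map if old_map[s] != new_map[s])


print(physical_machine(35, TWO_MACHINES), physical_machine(35, FOUR_MACHINES))
print(moved_shards(TWO_MACHINES, FOUR_MACHINES))
```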
schemas


all that 'public' stuff I'd
been glossing over for 2
years


- database:
  - schema:
    - table:
      - columns


using fabric, created
thousands of schemas
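
A rough sketch of what such a task has to generate: one `CREATE SCHEMA` statement per logical shard, grouped by the machine that hosts it. The `shard<N>` naming and the function name are assumptions; the actual Fabric tasks are not shown in the talk.

```python
def schema_ddl_by_machine(shard_map):
    """Group CREATE SCHEMA statements by the machine hosting each logical shard.

    shard_map maps logical shard number -> machine name. The 'shard<N>'
    schema naming is an assumption made for this sketch.
    """
    ddl = {}
    for shard, machine in sorted(shard_map.items()):
        ddl.setdefault(machine, []).append("CREATE SCHEMA shard%d;" % shard)
    return ddl


shard_map = {0: "A", 1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B", 7: "B"}
for machine, statements in schema_ddl_by_machine(shard_map).items():
    print(machine, statements)
```

A parallel task runner would then run each machine's list of statements (plus the per-schema table definitions) against that host.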


machineA:

Shard0
photos_by_user

Shard1
photos_by_user

Shard2
photos_by_user

Shard3
photos_by_user


machineB:

Shard4
photos_by_user

Shard5
photos_by_user

Shard6
photos_by_user

Shard7
photos_by_user


(Fabric or similar parallel
task executor is
essential)


application-side logic


# logical shard -> database (machine) index
SHARD_TO_DB = {}

SHARD_TO_DB[0] = 0
SHARD_TO_DB[1] = 0
SHARD_TO_DB[2] = 0
SHARD_TO_DB[3] = 0
SHARD_TO_DB[4] = 1
SHARD_TO_DB[5] = 1
SHARD_TO_DB[6] = 1
SHARD_TO_DB[7] = 1


instead of Django ORM,
wrote really simple db
abstraction layer


select/update/insert/
delete


select(fields, table_name,
shard_key, where_statement,
where_parameters)


shard_key % num_logical_shards =
shard_id


in most cases, user_id
for us
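
A minimal sketch of how such a `select` might turn its arguments into SQL. The argument list follows the slide; the `shard<N>` schema naming, the shard count of 8, and the helper name `build_select` are assumptions, and a real layer would also pick the connection for the shard and execute the query.

```python
NUM_LOGICAL_SHARDS = 8  # illustrative; the talk mentions thousands


def shard_id_for(shard_key):
    """shard_key % num_logical_shards = shard_id"""
    return shard_key % NUM_LOGICAL_SHARDS


def build_select(fields, table_name, shard_key, where_statement, where_parameters):
    """Return (sql, params) aimed at the schema of the key's logical shard.

    Values stay as driver placeholders (%s); only identifiers chosen by the
    application are interpolated into the SQL text.
    """
    schema = "shard%d" % shard_id_for(shard_key)
    sql = "SELECT %s FROM %s.%s WHERE %s" % (
        ", ".join(fields), schema, table_name, where_statement)
    return sql, tuple(where_parameters)


sql, params = build_select(
    ["id", "user_id"], "photos_by_user", 35, "user_id = %s", [35])
print(sql, params)
```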


custom Django test
runner to create/tear-
down sharded DBs


most queries involve
visiting handful of shards
over one or two
machines


if mapping across shards
on single DB, UNION
ALL to aggregate


clients to library pass in
((shard_key, id),
(shard_key, id)) etc


library maps sub-selects
to each shard, and each
machine


parallel execution! (per
machine, at least)
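
The grouping step can be sketched as follows: take `(shard_key, id)` pairs, bucket them by machine and logical shard, and emit one `UNION ALL` query per machine. The shard map, schema naming, and the `plan_queries` name are assumptions for illustration; executing the per-machine queries (in parallel or not) is left out.

```python
from collections import defaultdict

NUM_LOGICAL_SHARDS = 8
# Illustrative logical shard -> database index map.
SHARD_TO_DB = {0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1, 7: 1}


def plan_queries(pairs, table_name="photos_by_user"):
    """Build one UNION ALL query per machine from (shard_key, id) pairs.

    Returns {db_index: (sql, params)}. Each logical shard on a machine
    contributes one sub-select against its own schema.
    """
    by_db = defaultdict(lambda: defaultdict(list))
    for shard_key, obj_id in pairs:
        shard = shard_key % NUM_LOGICAL_SHARDS
        by_db[SHARD_TO_DB[shard]][shard].append(obj_id)

    plans = {}
    for db, shards in by_db.items():
        selects, params = [], []
        for shard in sorted(shards):
            ids = shards[shard]
            placeholders = ", ".join(["%s"] * len(ids))
            selects.append("(SELECT * FROM shard%d.%s WHERE id IN (%s))"
                           % (shard, table_name, placeholders))
            params.extend(ids)
        plans[db] = (" UNION ALL ".join(selects), tuple(params))
    return plans


for db, (sql, params) in plan_queries([(1, 10), (9, 11), (2, 12), (5, 13)]).items():
    print(db, sql, params)
```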


-> Append (cost=0.00..973.72 rows=100 width=12) (actual time=0.290..160.035 rows=300 loops=1)
   -> Limit (cost=0.00..806.24 rows=30 width=12) (actual time=0.288..159.913 rows=14 loops=1)
      -> Index Scan Backward using index on table (cost=0.00..18651.04 rows=694 width=12) (actual time=0.286..159.885 rows=14 loops=1)
   -> Limit (cost=0.00..71.15 rows=30 width=12) (actual time=0.015..0.018 rows=1 loops=1)
      -> Index Scan using index on table (cost=0.00..101.99 rows=43 width=12) (actual time=0.013..0.014 rows=1 loops=1)
   (etc)


Monday, April 23, 12 


eventually, would be nice
to parallelize across
machines


next challenge: unique
IDs


requirements


1 should be time
sortable without requiring
a lookup


2 should be 64-bit


3 low operational
complexity


surveyed the options


ticket servers?


UUID?


twitter snowflake?


application-level IDs ala
Mongo?


hey, the db is already
pretty good about
incrementing sequences


[ 41 bits of time in millis ]
[ 13 bits for shard ID ]
[ 10 bits sequence ID ]


CREATE OR REPLACE FUNCTION insta5.next_id(OUT result bigint) AS $$
DECLARE
    our_epoch bigint := 1314220021721;
    seq_id bigint;
    now_millis bigint;
    shard_id int := 5;
BEGIN
    SELECT nextval('insta5.table_id_seq') % 1024 INTO seq_id;

    SELECT FLOOR(EXTRACT(EPOCH FROM clock_timestamp()) * 1000) INTO now_millis;

    result := (now_millis - our_epoch) << 23;
    result := result | (shard_id << 10);
    result := result | (seq_id);
END;
$$ LANGUAGE PLPGSQL;


# pulling shard ID and timestamp from ID:

low_bits = id ^ ((id >> 23) << 23)   # shard ID + sequence (low 23 bits)
shard_id = low_bits >> 10            # drop the 10 sequence bits
timestamp = EPOCH + (id >> 23)
pros: guaranteed unique
in 64-bits, not much of a
CPU overhead


cons: large IDs from the
get-go


hundreds of millions of
IDs generated with this
scheme, no issues


well, what about 're-
sharding'


first recourse: pg_reorg


rewrites tables in index
order


only requires brief locks
for atomic table renames


20+GB savings on
some of our dbs


[graph: PostgreSQL database size by week (Munin 1.4.5); cur 79.91G, min 78.37G, avg 89.28G, max 96.07G; last update Wed Apr 18 00:25:14 2012]
especially useful on EC2


but sometimes you just
have to reshard


streaming replication to
the rescue


(btw, repmgr is
awesome)


repmgr standby clone <master> 


machineA:

Shard0
photos_by_user

Shard1
photos_by_user

Shard2
photos_by_user

Shard3
photos_by_user


machineA':

Shard0
photos_by_user

Shard1
photos_by_user

Shard2
photos_by_user

Shard3
photos_by_user


machineA:

Shard0
photos_by_user

Shard1
photos_by_user


machineC:

Shard2
photos_by_user

Shard3
photos_by_user


PGBouncer abstracts
moving DBs from the
app logic


can do this as long as
you have more logical
shards than physical
ones
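
The bookkeeping behind that split can be sketched as below: once a streaming replica of a machine has been promoted, half of the original's logical shards are repointed to it, and each side can then drop the schemas it no longer serves. The function name and the choice to move the upper half are assumptions for illustration.

```python
def split_machine(shard_map, source, target):
    """Plan the split of `source` after a replica of it is promoted as `target`.

    Returns (new_map, drop_on_source, drop_on_target): the updated logical ->
    physical map, plus the logical shards whose schemas each machine can drop.
    """
    owned = sorted(s for s, m in shard_map.items() if m == source)
    if len(owned) < 2:
        raise ValueError("need at least two logical shards on %r to split" % source)
    half = len(owned) // 2
    keep, move = owned[:half], owned[half:]
    new_map = dict(shard_map)
    for shard in move:
        new_map[shard] = target
    return new_map, move, keep


new_map, drop_on_a, drop_on_c = split_machine({0: "A", 1: "A", 2: "A", 3: "A"}, "A", "C")
print(new_map, drop_on_a, drop_on_c)
```

The error case shows why the slide's condition matters: a machine holding a single logical shard cannot be split this way.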


beauty of schemas is
that they are physically
different files


(no IO hit when deleting,
no 'swiss cheese')


downside: requires ~30
seconds of maintenance
to roll out new schema
mapping


(could be solved by
having concept of 'read-
only' mode for some
DBs)


not great for range-scans
that would span across
shards


latest project: follow
graph


v1: simple DB table
(source_id, target_id,
status)


who do I follow?
who follows me?
do I follow X?


does X follow me?
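
The v1 table and its four questions can be sketched with SQLite standing in for Postgres. The column names come from the slide; the table name `follows`, the index, and the `'active'` status value are assumptions.

```python
import sqlite3

# SQLite is only a stand-in here so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE follows (
        source_id INTEGER NOT NULL,
        target_id INTEGER NOT NULL,
        status    TEXT    NOT NULL,
        PRIMARY KEY (source_id, target_id)
    )""")
# Second index so "who follows me?" does not scan the whole table.
conn.execute("CREATE INDEX follows_by_target ON follows (target_id, source_id)")


def follow(source_id, target_id, status="active"):
    conn.execute("INSERT OR REPLACE INTO follows VALUES (?, ?, ?)",
                 (source_id, target_id, status))


def following(user_id):
    """who do I follow?"""
    rows = conn.execute("SELECT target_id FROM follows WHERE source_id = ? "
                        "ORDER BY target_id", (user_id,))
    return [r[0] for r in rows]


def followers(user_id):
    """who follows me?"""
    rows = conn.execute("SELECT source_id FROM follows WHERE target_id = ? "
                        "ORDER BY source_id", (user_id,))
    return [r[0] for r in rows]


def is_following(a, b):
    """do I follow X? (swap the arguments for: does X follow me?)"""
    row = conn.execute("SELECT 1 FROM follows WHERE source_id = ? AND target_id = ?",
                       (a, b)).fetchone()
    return row is not None


follow(1, 2)
follow(3, 1)
print(following(1), followers(1), is_following(1, 2), is_following(2, 1))
```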


DB was busy, so we
started storing parallel
version in Redis


follow_all(300 item list)


inconsistency


extra logic


so much extra logic


exposing your support
team to the idea of
cache invalidation


reset redis cache


redesign took a page
from twitter's book


PG can handle tens of
thousands of requests,
very light memcached
caching


next steps


isolating services to
minimize open conns


investigate physical
hardware / etc to reduce
need to re-shard


you don't need to give
up PG's durability &
features to shard


continue to let the DB do
what the DB is great at


'don't shard until you
have to'


(but don't over-estimate
how hard it will be, either)


scaled within constraints
of the cloud


PG success story


(we're really excited
about 9.2)